The Mathematics of Statistical Machine Translation: Parameter Estimation
Abstract
We describe a series of five statistical models of the translation process and give algorithms for estimating the parameters of these models given a set of pairs of sentences that are translations of one another. We define a concept of word-by-word alignment between such pairs of sentences. For any given pair of such sentences each of our models assigns a probability to each of the possible word-by-word alignments. We give an algorithm for seeking the most probable of these alignments. Although the algorithm is suboptimal, the alignment thus obtained accounts well for the word-by-word relationships in the pair of sentences. We have a great deal of data in French and English from the proceedings of the Canadian Parliament. Accordingly, we have restricted our work to these two languages; but we feel that because our algorithms have minimal linguistic content they would work well on other pairs of languages. We also feel, again because of the minimal linguistic content of our algorithms, that it is reasonable to argue that word-by-word alignments are inherent in any sufficiently large bilingual corpus.
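To make the idea of estimating parameters from sentence pairs and then picking a probable word-by-word alignment concrete, here is a minimal sketch. It is not any of the five models as specified in the paper: it assumes a single word-translation table t(f | e), a uniform prior over alignments, no empty (null) word, and a toy three-sentence corpus invented for illustration. The names `train_lexical_model` and `best_alignment` are hypothetical.

```python
from collections import defaultdict


def train_lexical_model(pairs, iterations=20):
    """Estimate t[(f, e)] ~ P(f | e) with EM, assuming all alignments
    of a sentence pair are equally likely a priori (an assumption of
    this sketch, not a claim about the paper's models)."""
    f_vocab = {f for fs, _ in pairs for f in fs}
    # Uniform starting point over the French vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links from e
        for fs, es in pairs:
            for f in fs:
                # E-step: share one unit of count for f among the
                # English words in proportion to the current table.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise expected counts per English word.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t


def best_alignment(fs, es, t):
    """For each French word, return the index of the English word
    with the highest table entry. Under this sketch's assumptions the
    choices are independent, so this is the most probable alignment."""
    return [max(range(len(es)), key=lambda i: t[(f, es[i])]) for f in fs]


# Toy corpus (hypothetical data, for illustration only).
toy_pairs = [
    (["la", "maison"], ["the", "house"]),
    (["la", "fleur"], ["the", "flower"]),
    (["une", "fleur"], ["a", "flower"]),
]
table = train_lexical_model(toy_pairs)
alignment = best_alignment(["la", "maison"], ["the", "house"], table)
```

In this simplified setting the best alignment can be found exactly, word by word. The abstract notes that the paper's algorithm for seeking the most probable alignment is suboptimal, which reflects that richer models do not decompose this way.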
Similar resources
Statistical Machine Translation : Robust parameter estimation from noisy corpus
In this report, we describe our study of the effect of noise on parameter estimation for statistical machine translation. So far, no study has been done on this topic, even though the algorithm used for parameter estimation in statistical machine translation (the EM algorithm) is known to be highly sensitive to noise. We present in detail the experiments performed to observe the influence of noise...
An English-Assamese Machine Translation System
Al-Onaizan, Y. et al., "Distortion models for statistical machine translation", In Proceedings of ACL-COLING, July 2006, pp. 529-536. Birch, A. et al., "Constraining the phrase-based, joint probability statistical translation model", In Proceedings of HLT-NAACL Workshop on Statistical Machine Translation, April 2006, pp. 154-157. Brown, P. F. et al., "The mathematics o...
Finite-state transducer-based statistical machine translation using joint probabilities
In this paper, we present our system for statistical machine translation that is based on weighted finite-state transducers. We describe the construction of the transducer, the estimation of the weights, acquisition of phrases (locally ordered tokens) and the mechanism we use for global reordering. We also present a novel approach to machine translation that uses a maximum entropy model for par...
Transductive Minimum Error Rate Training for Statistical Machine Translation
This paper investigates parameter adaptation in Statistical Machine Translation (SMT). To overcome the parameter bias-estimation problem with Minimum Error Rate Training (MERT), we extend it under a transductive learning framework, by iteratively re-estimating the parameters using both development and test data, in which the translation hypotheses of the test data are used as pseudo references. F...
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in the English language are hyphenated, with a hyphen separating the different parts. The Persian language has multi-part words as well. Based on Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use a space character instead of the half-space character. This common incorrect use of the space character leads to some s...
Journal: Computational Linguistics
Volume: 19
Publication date: 1993